Semi-Supervised Learning for Natural Language Processing
Abstract
The amount of unlabeled linguistic data available to us is much larger, and growing much faster, than the amount of labeled data. Semi-supervised learning algorithms combine unlabeled data with a small labeled training set to train better models. This tutorial emphasizes practical applications of semi-supervised learning; we treat semi-supervised learning methods as tools for building effective models from limited training data. An attendee will leave our tutorial with:
1. A basic knowledge of the most common classes of semi-supervised learning algorithms and where they have been used in NLP before.
2. The ability to decide which class will be useful in her research.
3. Suggestions for avoiding potential pitfalls in semi-supervised learning.
Related resources
Special semi-supervised techniques for Natural Language Processing tasks
A labeled natural language corpus is often difficult, expensive or time-consuming to obtain as its construction requires expert human effort. On the other hand, unlabelled texts are available in abundance thanks to the World Wide Web. The importance of utilizing unlabeled data in machine learning systems is growing. Here, we investigate classic semi-supervised approaches and examine the potenti...
Semi-supervised Classification for Natural Language Processing
Semi-supervised classification is an interesting idea where classification models are learned from both labeled and unlabeled data. It has several advantages over supervised classification in natural language processing domain. For instance, supervised classification exploits only labeled data that are expensive, often difficult to get, inadequate in quantity, and require human experts for anno...
Scalable Graph-Based Learning Applied to Human Language Technology
Andrei Alexandrescu. Chair of the Supervisory Committee: Associate Research Professor Katrin Kirchhoff, Electrical Engineering. Graph-based semi-supervised learning techniques have recently attracted increasing attention as a means to utilize unlabeled data in machine learning by placing data points in a similarity graph. However, ...
Invited Talk: Domain-adaptation of Natural Language Processing Tools for RE
Natural language processing tools like part-of-speech taggers and parsers are being used in a variety of applications involving natural language, including RE. Such tools, based on statistical models of language, are learnt via supervised machine learning algorithms from human-annotated data. Due to their dependence on annotated data, which is limited in size and genre, these models have a fall...
Tutorial on Inductive Semi-supervised Learning Methods: with Applicability to Natural Language Processing
Supervised machine learning methods which learn from labelled (or annotated) data are now widely used in many different areas of Computational Linguistics and Natural Language Processing. There are widespread data annotation endeavours but they face problems: there are a large number of languages and annotation is expensive, while at the same time raw text data is plentiful. Semi-supervised lea...
Publication date: 2008